# Efficient Multi-stage Inference on Tabular Data

## Setup
### If using docker:
Build the image with `docker build . -t myimage`, then start a container with `docker container run -it myimage /bin/bash`.
Building the image installs all of the Python packages, so no further setup is needed.

### Otherwise, install the Python (version 3.8) packages:
`pip install -r requirements.txt`

## Reproducing results
### Generate all results
Run `./all.sh` to verify Table 1, Table 2, and Figure 7.

### Table 1
Run `./table1.sh` to verify all of Table 1.
Run `python3 evaluate_individual_models.py datasetname` to verify individual rows (e.g. `python3 evaluate_individual_models.py banknote`).

### Table 2
Run `./table2.sh` to verify all of Table 2.
Run `python3 evaluate_hybrid_model.py datasetname` to verify individual rows (e.g. `python3 evaluate_hybrid_model.py banknote`).

### Figure 7
Run `./figure7.sh` to verify Figure 7.
Run `python3 evaluate_hybrid_model.py datasetname --make_plots` to verify the individual datasets from Figure 7 (e.g. `python3 evaluate_hybrid_model.py banknote --make_plots`).
Figures are written to the `output` directory.
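
To run the per-dataset command for several datasets in one go, a small wrapper along the following lines may help. It is only a sketch: `build_command` and `run_datasets` are hypothetical helper names that are not part of this repository, it assumes it is run from the repository root, and `banknote` is the only dataset name confirmed above.

```python
import subprocess
import sys
from typing import List, Sequence


def build_command(dataset: str, make_plots: bool = False) -> List[str]:
    """Return the argv list for one evaluate_hybrid_model.py run."""
    # sys.executable reuses the interpreter running this helper
    # (an assumption; the README itself calls `python3`).
    command = [sys.executable, "evaluate_hybrid_model.py", dataset]
    if make_plots:
        command.append("--make_plots")
    return command


def run_datasets(datasets: Sequence[str], make_plots: bool = False) -> None:
    """Run the evaluation once per dataset, stopping at the first failure."""
    for dataset in datasets:
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(build_command(dataset, make_plots), check=True)
```

For example, `run_datasets(["banknote"], make_plots=True)` would issue the same command as the Figure 7 example above.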

### Other figures/tables
Unfortunately, some results, such as the latency test in the production environment, cannot be easily replicated because they require access to the production RPC calls.

## Finding hyperparameters
This step is not necessary to reproduce the results, since hyperparameters are already stored in the `hyperparameters` directory.
The hyperparameters are chosen based on the best ROC AUC score.
To find and save hyperparameters for a dataset, run `python3 save_hyperparams.py datasetname` (e.g. `python3 save_hyperparams.py banknote`).
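
As an illustration of what "best ROC AUC score" means, the sketch below computes ROC AUC from scratch and picks the highest-scoring candidate. It does not reproduce the search in `save_hyperparams.py`, which is not described here; the function names and the idea of comparing named candidates on one set of validation labels are assumptions made for the example.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n^2) and meant for
    illustration only.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def pick_best(
    candidates: Dict[str, Sequence[float]], labels: Sequence[int]
) -> Tuple[str, float]:
    """Return the name of the candidate with the highest ROC AUC, and that score."""
    scored = {name: roc_auc(labels, scores) for name, scores in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]
```

Here each entry in `candidates` maps a hypothetical configuration name to the scores that configuration assigned to the validation examples. In practice a library routine such as scikit-learn's `roc_auc_score` would likely be the more efficient choice.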

## Viewing hyperparameters
To see the saved hyperparameters used in the paper, run `python3 print_hyperparams.py path/to/hyperparameter/pickle/file`. For example, for the banknote model, run `python3 print_hyperparams.py hyperparameters/banknote.p`.
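
To inspect a hyperparameter file from your own code instead, something like the following sketch could work. It assumes the `.p` files are ordinary Python pickles; the structure of the stored object is not documented here, so `describe` simply falls back to `repr` for anything that is not a dict. Both function names are hypothetical, and unpickling may require the packages from `requirements.txt` to be installed.

```python
import pickle
from pathlib import Path
from typing import Any


def load_hyperparams(path: str) -> Any:
    """Unpickle and return whatever object the file holds."""
    # Only unpickle files you trust: pickle can execute arbitrary code on load.
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


def describe(obj: Any) -> str:
    """Format a loaded object, one key per line when it is a dict."""
    if isinstance(obj, dict):
        items = sorted(obj.items(), key=lambda kv: str(kv[0]))
        return "\n".join(f"{key}: {value!r}" for key, value in items)
    return repr(obj)
```

For example, `print(describe(load_hyperparams("hyperparameters/banknote.p")))` would print the contents of the banknote file, in whatever form they were saved.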

## Code Layout
The public datasets are stored in the `data` directory.
The hyperparameters are stored in the `hyperparameters` directory.
The LRwBins model is found in the `models/LRBinsModel.py` file.
The figures are saved to the `output` directory.
The reproduction scripts (described above) are stored in the root directory.
The `utils.py` file helps import the datasets for the reproducibility code.